Abstract. In this paper we present a scalable pointer analysis for embedded
applications that is able to distinguish between instances of recursively
defined data structures and elements of arrays. The main contribution consists
of an efficient yet precise algorithm that can handle multithreaded programs.
We first perform an inexpensive flow-sensitive analysis of each function in the
program that generates semantic equations describing the effect of the function
on the memory graph. These equations bear numerical constraints that describe
nonuniform points-to relationships. We then iteratively solve these equations
in order to obtain an abstract storage graph that describes the shape of data
structures at every point of the program for all possible thread interleavings.
We bring experimental evidence that this approach is tractable and precise for
real-size embedded applications.


1 Introduction 

The difficulty of statically computing precise points-to information is a major
obstacle to the automatic verification of real programs. Recent successes in the
verification of safety-critical software [BCC+03] have been enabled in part
because this class of programs makes a very restricted use of pointer
manipulations and dynamic memory allocation. There are numerous
pointer-intensive applications that are not safety-critical yet still require a
high level of dependability, such as unmanned spacecraft flight control, flight
data visualization or on-board network management. These programs commonly use
arrays and linked lists to store pointers to semaphores, message queues and data
packets (for interprocess communication), partitions of the memory, etc.
Existing scalable pointer analyses [Ste96,FFSA98,Das00,HT01] are uniform, i.e.
they do not distinguish between elements of arrays or components of recursive
data structures and are therefore of little help for the verification of these
programs. It is the purpose of this paper to address the problem of inferring
nonuniform points-to information for embedded programs.

Few nonuniform pointer analyses have been studied in the literature. The first
one was designed by Deutsch [Deu92,Deu94] and applies to programs with explicit
data type annotations. We first redesigned Deutsch's model in order to analyze
languages like C in which the type information cannot be trusted to infer the
shape of a data structure [Ven96,Ven99]. However both approaches rely on a
costly representation of the aliasing as an equivalence relation between access
paths, which makes this kind of analysis inapplicable to programs larger than a
few thousand lines. We therefore designed a new semantic model [Ven02] that is
both more compact and more expressive than the one based on access paths. The
interest of the latter approach lies in the representation of dynamic memory
allocation using numerical timestamps, which turns pointer analysis into the
classical problem of computing the numerical invariants of an arithmetic
program. In the case of a sequential program, various optimization techniques
can be applied that break down the complexity of analyzing large arithmetic
programs as described in [BCC+02,BCC+03]. In the case of multithreaded
arithmetic programs however, there are no proven techniques that can cope with
shared data and thread interleaving efficiently and precisely. This is a major
drawback, since most embedded applications are multithreaded.

In this paper we present a pointer analysis based on the semantic model of
[Ven02] that can infer nonuniform points-to relations for multithreaded
programs. From our experience with the verification of real embedded
applications we observed that collections of objects are usually manipulated in
a very regular way using simple loops. Furthermore, these loops are generally
controlled by local scalar variables like an array index or a pointer to the
elements of a list. It is quite uncommon to find global array indices or lists
that are modified across function calls. Therefore, the information flowing
through this local control structure is sufficient in practice to describe
exactly the layout of arrays and the shape of linked data structures. We call it
the surface structure of a program. In the new model proposed here we first
perform a flow-sensitive analysis of the surface structure that automatically
discovers numerical loop invariants relating array positions and timestamps of
dynamically created objects. We use these invariants to generate semantic
equations that model the effect of the function on the memory. We then
iteratively solve the system made of the semantic equations generated from all
functions in the program. A similar approach has been applied in [WL02] for
improving the precision of inclusion-based flow-insensitive pointer analyses.
Our model can be seen as a natural extension to Andersen's algorithm [And94] in
which variables are indexed by integers denoting array positions and timestamps,
and inclusion constraints bear numerical relations between the indices of
variables. We will carry on the presentation of the analysis with this analogy
in mind.

The paper is organized as follows. In Sect. 2 we define the base semantic model
and the surface structure of a C program. The semantics is based on timestamps
to identify instances of dynamically allocated objects. Section 3 describes the
abstract interpretation of the surface structure and the inference of numerical
invariants. In Sect. 4 we show how to generate nonuniform inclusion constraints
from the numerical relationships obtained by the analysis of the surface
structure. The iterative resolution of these constraints provides us with a
global approximation of the memory graph. We describe the implementation of an
analyzer for the full C language in Sect. 5 and give some experimental results
from the analysis of a real device driver. We end the paper with concluding
remarks and future work.

2 Base Semantic Model 

In [Ven02] we have introduced a semantic model that uniquely identifies
instances of dynamically allocated objects by using timestamps of the form
$(\lambda_1, \ldots, \lambda_n)$ where the $\lambda_i$ are counters associated
to each loop enclosing a memory allocation command. Consider for example the
following piece of code:

Example 1. 

for(i = 0; i < 10; i++)
  for(j = 0; j < 3; j++)
    a[i][j].ptr = malloc(...);

In that model we would consider the couple $(i, j)$ as a timestamp for
distinguishing between calls to the malloc command. In this paper we use a
simplified model which folds all nested loop counters into one. In the previous
example, this would result in considering the timestamp $3i + j$. This amounts
to having one global counter $\lambda$ that is incremented whenever the
execution crosses a loop and is reset to 0 whenever the execution exits an
outermost loop. While both models are equivalent in uniquely identifying
dynamically allocated memory, the loss of information about nested loop counters
may lead to imprecisions when timestamps are represented by abstract numerical
lattices [Kar76,CH78,Gra91,Min01]. This is not an issue in embedded applications
since almost all loops have constant iteration bounds and arrays are traversed
in a regular way as in the example above. This type of loop invariant can be
efficiently and exactly computed by using, for example, the reduced product
[CC79] of the lattices of linear equalities [Kar76] and intervals [CC76].

Because C allows the programmer to change the layout of a structured block via
aggressive type casts, using symbolic data selectors like in [Ven02] for
representing points-to relations is quite challenging (see [CR99] for a detailed
discussion of type casting in C). In our case this would make the analysis
overly complicated since we also have to manage numerical constraints that
relate timestamps and positions within blocks. We choose a simple solution that
consists of using a homogeneous byte-based representation of positions within
memory blocks. This means that a field in a structure is identified by its byte
offset from the beginning of the structure. As a consequence we must take
architecture-dependent characteristics like alignment and padding into account.
Fortunately, most C front-ends provide this information for free. In such a
model an edge in the points-to graph has the form
$\langle a, o\rangle \triangleright \langle a', o'\rangle$ where $a, a'$ are
addresses of blocks in memory and $o, o'$ are byte offsets within these blocks.
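The byte-based representation can be illustrated on the array of Example 1. The following is only a sketch: the element type `struct elem`, its `tag` field and the two function names are hypothetical and do not come from the analyzer. It computes the offset of `a[i][j].ptr` from the size and field offset reported by the compiler, and compares it with the offset measured on an actual array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type standing in for the structure of Example 1. */
struct elem {
    int   tag;   /* illustrative field, not taken from the paper */
    void *ptr;   /* the pointer field written in Example 1 */
};

/* Byte offset of a[i][j].ptr from the start of a 10 x 3 array of struct elem,
 * computed from the structure size s and the field offset o_ptr. */
size_t ptr_offset(size_t i, size_t j)
{
    assert(i < 10 && j < 3);
    return (3 * i + j) * sizeof(struct elem) + offsetof(struct elem, ptr);
}

/* The same offset, measured directly on a real array. */
size_t measured_offset(size_t i, size_t j)
{
    static struct elem a[10][3];
    assert(i < 10 && j < 3);
    return (size_t)((char *)&a[i][j].ptr - (char *)a);
}
```

Both functions agree on every index pair, whatever alignment and padding the target architecture imposes, because `sizeof` and `offsetof` already account for them.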

Our purpose is to abstract a C program into a system of points-to equations
expressed by inclusion constraints similarly to Andersen's analysis [And94].
Since we want to express nonuniform aliasing relationships, we need to assign
position and timestamp indices to semantic variables and relate them by using
numerical constraints. For example, we would like to generate an inclusion
constraint for the piece of code of Example 1 that looks like:

$$*(\&a + (i \times s + o_{ptr})) \supseteq \mathrm{malloc}_t \quad \text{where } i = t \wedge t \in [0, 29]$$

s ::= n = c
    | n = m + o
    | n = m * o
    | p = &x
    | p = q + n
    | p = *q
    | *p = q
    | p = malloc()
    | while (m < n) do s_1; ...; s_n end

Fig. 1. Syntax of the core pointer language

where $s$ is the size of the structure contained in the two-dimensional array,
$o_{ptr}$ is the offset of the field ptr in that structure and $t$ is the
timestamp of the memory allocation statement. In order to infer this kind of
constraint we must first perform a flow-sensitive analysis over a relational
numerical lattice [Kar76,CH78,Gra91,Min01] that computes invariants relating
loop counters, array indices and timestamps. The main difference from [Ven02]
comes from the fact that we generate inclusion constraints without any prior
knowledge of the layout of objects in the heap. In this case it is not obvious
what to do with the following piece of code:

Example 2. 

for(i = 0; i < 10; i++) {
  p = p->next;
}

The rest of this section will be devoted to defining a concrete semantic model 
that will allow us to handle this situation simply and precisely. 

We base our semantic specification on a small language that captures the core
pointer arithmetic of C at the function level. The treatment of interprocedural
mechanisms is postponed until Sect. 4 where we will detail the generation of
inclusion constraints. We call surface variable a variable which has a scalar
type, either integer or pointer, and which does not have its address taken. The
syntax of the language is defined in Fig. 1, where we denote by $p, q, r$
pointer-valued surface variables, by $m, n, o$ integer-valued surface variables,
and by $x, y, z$ all other variables. We assume that the variable on the
left-hand side of an assignment operation does not appear on the right-hand
side. This will facilitate the design of the numerical abstract interpretation
in Sect. 3. It is always possible to rewrite the program in order to satisfy
this assumption. Note that in order to keep the presentation simple, we focus on
fundamental arithmetic operations and loops. All other constructs can be
analyzed along the same lines. We use this language to model the computations
that occur locally within the body of a C function, excluding calls to other
functions. A program $P$ in this language is just a sequence of statements
describing the pointer manipulations performed by a function. We provide $P$
with a small-step operational semantics given by a transition system
$(\Sigma, \rightarrow)$ defined as follows.

We first need some notations. We assume that each statement of $P$ is assigned
a unique label $\ell$. If $\ell$ is the label of a statement, we denote by
$\mathrm{next}(\ell)$ the label of the next statement of $P$ to be executed in
the natural execution order. If $\ell$ is the label of a loop we denote by
$\mathrm{top}(\ell)$ the predicate that is true iff the statement at $\ell$ is
an outermost loop. A state of $\Sigma$ is a tuple
$\langle\lambda, M, \rho, \ell\rangle$ where $\lambda$ is an integer denoting
the global loop counter used for timestamping, $M$ is a memory graph, $\rho$ is
an environment and $\ell$ is the label of the next statement to execute. A
memory graph is a collection of points-to edges
$\langle a, o\rangle \triangleright \langle a', o'\rangle$ where $a, a'$ are
addresses and $o, o'$ are integers representing byte offsets. An address is
either the location $\&x$ of a global variable or a dynamically allocated block
$\mathrm{blk}_\ell(t)$, where $\ell$ is the location of the allocation statement
and $t$ is a timestamp. We use a special address null to represent the NULL
pointer value in C. The mapping defined by a memory graph is functional, i.e.
there is at most one outgoing edge for each memory location
$\langle a, o\rangle$. We denote by $M\langle a, o\rangle$ the target location
of the edge originating from the location $\langle a, o\rangle$ if it exists, or
$\langle \mathrm{null}, 0\rangle$ otherwise. We denote by
$M[\langle a, o\rangle \triangleright \langle a', o'\rangle]$ the memory graph
$M$ which has been updated with the edge
$\langle a, o\rangle \triangleright \langle a', o'\rangle$.

We split each pointer variable $p$ into two variables $p_a$ and $p_o$ that
respectively denote the address of the block and the offset within this block to
which $p$ points. An environment $\rho$ maps variables $n, p_o$ to integers and
variables $p_a$ to addresses. We denote by $\rho[u \leftarrow v]$ the
environment $\rho$ in which the variable $u$ has been assigned the value $v$.
Finally, we denote by $\Omega$ a special element of $\Sigma$ representing the
error state. The transition relation $\rightarrow$ of the operational semantics
is then defined in Fig. 2. An initial state in this operational semantics
assigns arbitrary integer values to surface integer variables and the null
memory location to surface pointer variables. This amounts to considering
integer variables as uninitialized and pointers initialized to NULL. For
consistency the initial value of $\lambda$ should be 0. In our framework an
initial state describes the memory configuration at the entry of the C function
that is modeled by the program $P$.

The transition rule for loop exits requires some explanation. The global loop
counter $\lambda$ is incremented at the end of each loop iteration and
decremented whenever the execution steps out of a nested loop. Whether the
global loop counter is decremented or left unchanged at loop exit has no effect
on the uniqueness of timestamps. However decrementation is required in order to
preserve linear relationships between $\lambda$ and byte offsets during the
traversal of multidimensional arrays. Consider the two nested loops of
Example 1. We keep the previous notations and we denote by $O$ the byte offset
within a on the left-hand side of the assignment. Then, the relation between $O$
and the loop counters is given by
$O = 3 \times s \times i + s \times j + o_{ptr}$. If we use the decrementation
rule at loop exit, the global loop counter value is given by
$\lambda = 3 \times i + j$, hence $O = s \times \lambda + o_{ptr}$.
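$$
\begin{array}{l}
\ldots \\
\langle\lambda, M, \rho, \ell : n = m + o\rangle \rightarrow \langle\lambda, M, \rho[n \leftarrow \rho(m) + \rho(o)], \mathrm{next}(\ell)\rangle \\
\langle\lambda, M, \rho, \ell : n = m * o\rangle \rightarrow \langle\lambda, M, \rho[n \leftarrow \rho(m) \times \rho(o)], \mathrm{next}(\ell)\rangle \\
\langle\lambda, M, \rho, \ell : p = \&x\rangle \rightarrow \langle\lambda, M, \rho[p_o \leftarrow 0, p_a \leftarrow \&x], \mathrm{next}(\ell)\rangle \\
\langle\lambda, M, \rho, \ell : p = q + n\rangle \rightarrow \langle\lambda, M, \rho[p_o \leftarrow \rho(q_o) + \rho(n), p_a \leftarrow \rho(q_a)], \mathrm{next}(\ell)\rangle \\
\ldots \\
\langle\lambda, M, \rho, \ell : {*p} = q\rangle \rightarrow \langle\lambda, M[\langle\rho(p_a), \rho(p_o)\rangle \triangleright \langle\rho(q_a), \rho(q_o)\rangle], \rho, \mathrm{next}(\ell)\rangle \quad \text{otherwise} \\
\langle\lambda, M, \rho, \ell : \mathtt{while}\ (m < n)\ \mathtt{do}\ \ell' : s_1; \ldots\ \mathtt{end}\rangle \rightarrow
\left\{\begin{array}{ll}
\langle\lambda, M, \rho, \ell'\rangle & \text{if } \rho(m) < \rho(n) \\
\langle 0, M, \rho, \mathrm{next}(\ell)\rangle & \text{if } \rho(m) \ge \rho(n) \text{ and } \mathrm{top}(\ell) \\
\langle\lambda - 1, M, \rho, \mathrm{next}(\ell)\rangle & \text{otherwise}
\end{array}\right. \\
\langle\lambda, M, \rho, \ell : \mathtt{end}\rangle \rightarrow \langle\lambda + 1, M, \rho, \ell'\rangle \quad \text{where } \ell' : \mathtt{while}\ (\ldots)\ \mathtt{do}\ \ldots\ \ell : \mathtt{end}
\end{array}
$$

Fig. 2. Operational semantics of the core pointer language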


Without this rule $\lambda$ would be equal to $4 \times i + j$ and the
relationship between the global loop counter and $O$ would be lost, thereby
preventing the inference of a nonuniform points-to relation.
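The effect of the decrementation rule can be checked by maintaining the global loop counter by hand while running the loops of Example 1. The function below is a minimal sketch written for illustration; its name and parameters are hypothetical. It reports whether the counter equals `stride * i + j` at every allocation site.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Runs the two nested loops of Example 1 while maintaining the global loop
 * counter by hand, and reports whether the counter equals stride * i + j
 * each time the allocation site is reached. The reset to 0 at the exit of
 * the outermost loop happens after the last allocation and is not modeled.
 */
bool counter_is_linear(bool decrement_on_nested_exit, int stride)
{
    int lambda = 0;
    bool ok = true;
    assert(stride > 0);
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 3; j++) {
            /* a[i][j].ptr = malloc(...) would be timestamped with lambda */
            if (lambda != stride * i + j)
                ok = false;
            lambda++;               /* end of an inner iteration */
        }
        if (decrement_on_nested_exit)
            lambda--;               /* exit of a nested loop */
        lambda++;                   /* end of an outer iteration */
    }
    return ok;
}
```

With the decrementation rule the counter follows `3 * i + j`, which is the flat index of `a[i][j]`; without it the counter follows `4 * i + j`, which no longer lines up with the array layout.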

This operational semantics is similar to the one described in [Ven02] with a
simplified timestamping. We need to instrument the semantics by adding an
intermediate layer between the environment and the memory that keeps track of
all memory accesses. Whenever a location is retrieved from the memory, we use a
timestamp to tag it with a unique name that we call an anchor, and we keep the
binding between this anchor and the actual memory location in a separate
structure $\mathcal{A}$ called the anchorage. The local environment $\rho$ now
maps the address component $p_a$ of a surface variable either to an address that
explicitly appears in the body of a C function or to an anchor. We call this
refined semantics the surface semantics. More formally, the surface semantics
$(\Sigma_s, \rightarrow_s)$ of a program $P$ is defined as follows. An extended
state of $\Sigma_s$ is a tuple
$\langle\lambda, \mathcal{A}, M, \rho, \ell\rangle$ where
$\langle\lambda, M, \rho, \ell\rangle \in \Sigma$ and $\mathcal{A}$ is an
anchorage. An anchor $\mathrm{ref}_\ell(t)$ denotes the value returned by the
execution of a memory read command $\ell : p = *q$ at program point $\ell$ at
time $t$. The anchorage maps an anchor $\mathrm{ref}_\ell(t)$ to an actual
memory location $\langle a, o\rangle$. If $\langle a, o\rangle$ is a location
stored in the environment $\rho$, $a$ may either be an address or an anchor. We
define the resolution function $\mathrm{get}_{\mathcal{A}}$ which maps
$\langle a, o\rangle$ to the corresponding memory location as follows:

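$$
\mathrm{get}_{\mathcal{A}}\langle a, o\rangle =
\left\{\begin{array}{ll}
\langle \mathrm{null}, 0\rangle & \text{if } a \text{ is an anchor and } \mathcal{A}(a) = \langle \mathrm{null}, 0\rangle \\
\langle a', o + o'\rangle & \text{if } a \text{ is an anchor and } \mathcal{A}(a) = \langle a', o'\rangle \\
\langle a, o\rangle & \text{if } a \text{ is an address}
\end{array}\right.
$$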
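The three cases of the resolution function can be sketched in C as follows. This is an illustration only: the types `addr` and `location`, the function `resolve`, and the choice of storing the anchorage binding as a pointer inside each anchor are assumptions made to keep the sketch short, not the data structures of the analyzer.

```c
#include <assert.h>
#include <stddef.h>

enum kind { NULL_ADDR, ADDRESS, ANCHOR };

struct location;

/* The address component of a location stored in the environment. */
struct addr {
    enum kind kind;
    int id;                        /* identifies a block or an anchor */
    const struct location *bound;  /* for an anchor: its anchorage entry */
};

/* A memory location <a, o>. */
struct location {
    struct addr a;
    long o;
};

static const struct location null_location = { { NULL_ADDR, 0, NULL }, 0 };

/* Maps a location of the environment to the actual memory location. */
struct location resolve(struct location loc)
{
    if (loc.a.kind != ANCHOR)
        return loc;                       /* a is an address: <a, o> */
    assert(loc.a.bound != NULL);
    if (loc.a.bound->a.kind == NULL_ADDR)
        return null_location;             /* A(a) = <null, 0> */
    struct location r = *loc.a.bound;     /* A(a) = <a', o'> */
    r.o += loc.o;                         /* result is <a', o + o'> */
    return r;
}
```

An anchor bound to offset 8 of some block and carried with offset 4 in the environment therefore resolves to offset 12 of that block.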

If $p$ is a surface pointer and $\rho$ is an environment, we denote by
$\mathrm{get}_{\mathcal{A},\rho}(p)$ the memory location
$\mathrm{get}_{\mathcal{A}}\langle\rho(p_a), \rho(p_o)\rangle$. The transition
relation $\rightarrow_s$ of the surface semantics is then defined in Fig. 3. The
error state in this semantics is also denoted by $\Omega$. An initial state in
the surface semantics is simply an initial state in the base semantics with an
empty anchorage. We denote by $I$ the set of all initial states.

Fig. 3. Surface semantics of the core pointer language

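We are interested in the collecting semantics [Cou81] of a program $P$, that is
the set $C = \{\sigma \mid \exists i \in I : i \rightarrow_s^* \sigma\}$ of all
states reachable from any initial state $i$. We define the surface structure $S$
of $P$ as follows:

$$S = \{\langle\lambda, \rho, \ell\rangle \mid \exists M\ \exists \mathcal{A} : \langle\lambda, \mathcal{A}, M, \rho, \ell\rangle \in C\}$$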

An element $\langle\lambda, \rho, \ell\rangle$ is called a surface configuration. The program P models
the pointer manipulations performed by a single C function. Our purpose is to 
compute a global approximation of the memory for a whole C program by first 
performing an abstract interpretation of the surface structure of each function 
in the program. The design of this abstract interpretation is straightforward 
because the surface structure is independent from the data stored in the heap and 
does not interfere with other threads. We will then generate inclusion constraints 
from the results of the analysis of the surface structure that will provide us with 
a global approximation of the memory and the anchorage structure as well. 

3 Abstract Interpretation of the Surface Structure 

We describe the analysis of the surface structure within the framework of
Abstract Interpretation [CC77,CC79,Cou81,CC92]. We define an abstract
environment by a pair $\langle\nu^\sharp, \pi^\sharp\rangle$ as follows:

- The component $\nu^\sharp$ is an abstract numerical relation belonging to a
  given numerical lattice $\mathcal{V}^\sharp$ [Kar76,CH78,Gra91,Min01] that we
  leave as a parameter of our analysis. The abstract relation $\nu^\sharp$ is a
  collection of numerical constraints between all integer-valued variables
  $n, p_o$ of the program and a special variable $\Lambda$ denoting the value of
  the global loop counter.

- The component $\pi^\sharp$ maps every variable $p_a$ to a set of abstract
  addresses.

An abstract address is either the address $\&x$ of a global variable, a
dynamically allocated block $\mathrm{blk}_\ell(\mu^\sharp)$ or an anchor
$\mathrm{ref}_\ell(\mu^\sharp)$, where $\mu^\sharp$ is an abstract numerical
relation between the loop counter variable $\Lambda$ and a special timestamp
variable denoted by $\tau$. We assume that for each set of abstract addresses,
there is at most one abstract address $\mathrm{blk}_\ell(\mu^\sharp)$ or
$\mathrm{ref}_\ell(\mu^\sharp)$ per program location $\ell$. Therefore, the set
$\mathcal{E}^\sharp$ of all abstract environments is isomorphic to the product
$(\mathcal{V}^\sharp)^I$ of the numerical lattice over a fixed family $I$. We
provide $\mathcal{E}^\sharp$ with the structure of a lattice by lifting all
operations of $\mathcal{V}^\sharp$ to $\mathcal{E}^\sharp$ pointwise.

Fig. 4. Abstract surface semantics of atomic statements

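The denotation $\gamma_{\mathcal{V}^\sharp}(\nu^\sharp)$ of an abstract
numerical relation is a set of variable assignments $\varepsilon$ that satisfy
the numerical constraints expressed by $\nu^\sharp$. If $x_1, \ldots, x_n$ are
numerical variables and $v_1, \ldots, v_n$ are integer values, we denote by
$\nu^\sharp(x_1 \mapsto v_1, \ldots, x_n \mapsto v_n)$ the predicate that is
true iff there is an assignment
$\varepsilon \in \gamma_{\mathcal{V}^\sharp}(\nu^\sharp)$ such that
$\varepsilon(x_i) = v_i$ for all $1 \le i \le n$. The denotation
$\gamma_{\mathcal{E}^\sharp}\langle\nu^\sharp, \pi^\sharp\rangle$ of an abstract
environment is the set of all pairs $\langle\lambda, \rho\rangle$ where
$\lambda \in \mathbb{N}$ and $\rho$ is an environment of the surface semantics,
such that:

- $\nu^\sharp(n \mapsto \rho(n), \ldots, p_o \mapsto \rho(p_o), \ldots, \Lambda \mapsto \lambda)$
  for all variables $n, \ldots, p, \ldots$ of the program

- $\rho(p_a) = \&x \Rightarrow \&x \in \pi^\sharp(p_a)$

- $\rho(p_a) = \mathrm{blk}_\ell(t) \Rightarrow \mathrm{blk}_\ell(\mu^\sharp) \in \pi^\sharp(p_a) \wedge \mu^\sharp(\tau \mapsto t, \Lambda \mapsto \lambda)$

- $\rho(p_a) = \mathrm{ref}_\ell(t) \Rightarrow \mathrm{ref}_\ell(\mu^\sharp) \in \pi^\sharp(p_a) \wedge \mu^\sharp(\tau \mapsto t, \Lambda \mapsto \lambda)$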

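An abstract surface configuration of the program is a family
$\langle\nu^\sharp_\ell, \pi^\sharp_\ell\rangle_{\ell \in \mathrm{Loc}(P)}$ of
abstract environments, one for each location $\ell$ in the program $P$
considered. We provide the set of all abstract surface configurations with a
lattice structure by pointwise extension of operations from
$\mathcal{E}^\sharp$. The denotation
$\gamma\langle\nu^\sharp_\ell, \pi^\sharp_\ell\rangle_{\ell \in \mathrm{Loc}(P)}$
of an abstract configuration is the set of all surface configurations
$\langle\lambda, \rho, \ell\rangle$ such that
$\langle\lambda, \rho\rangle \in \gamma_{\mathcal{E}^\sharp}\langle\nu^\sharp_\ell, \pi^\sharp_\ell\rangle$.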

Following the methodology of Abstract Interpretation, we must now define the
abstract semantics of the language. We first have to define some operations on
the abstract numerical lattice $\mathcal{V}^\sharp$. If
$\nu^\sharp \in \mathcal{V}^\sharp$ and $V$ is a set of variables, we denote by
$\nu^\sharp \ominus V$ the abstract numerical relation in which all information
about variables in $V$ has been lost, and by $[\nu^\sharp]_V$ the relation that
only keeps information for variables in $V$. If $S$ is a system of arbitrary
numerical constraints, we denote by $\nu^\sharp \oplus S$ an abstract numerical
relation representing all variable assignments that are in the denotation of
$\nu^\sharp$ and that are also solutions of $S$. If $v$ is a variable, we denote
by $\nu^\sharp[v := v + c]$ the operation that consists of adding the increment
$c$ to the value of $v$. The implementation of these operations depends on the
abstract numerical lattice considered, and we refer the reader to the
corresponding papers for more details about the underlying algorithms
[CC76,Kar76,CH78,Gra91,Min01]. We assign an abstract semantics
$[\![s]\!]^\sharp : \mathcal{E}^\sharp \rightarrow \mathcal{E}^\sharp$ to each
atomic statement $s$ of the language as defined in Fig. 4.

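If $\langle\nu^\sharp, \pi^\sharp\rangle$ is an abstract environment, we define
the result $\langle\nu^{\sharp\prime}, \pi^{\sharp\prime}\rangle$ of the
operation $\mathrm{inc}_\Lambda\langle\nu^\sharp, \pi^\sharp\rangle$ as follows:

- $\nu^{\sharp\prime} = \nu^\sharp[\Lambda := \Lambda + 1]$

- For every $p$, $\pi^{\sharp\prime}(p_a)$ is obtained from $\pi^\sharp(p_a)$ by
  transforming each abstract address as follows:

$$
\begin{array}{lcl}
\&x & \mapsto & \&x \\
\mathrm{blk}_\ell(\mu^\sharp) & \mapsto & \mathrm{blk}_\ell(\mu^\sharp[\Lambda := \Lambda + 1]) \\
\mathrm{ref}_\ell(\mu^\sharp) & \mapsto & \mathrm{ref}_\ell(\mu^\sharp[\Lambda := \Lambda + 1])
\end{array}
$$

We define the operation $\mathrm{dec}_\Lambda\langle\nu^\sharp, \pi^\sharp\rangle$
(resp. $\mathrm{reset}_\Lambda\langle\nu^\sharp, \pi^\sharp\rangle$) similarly by
substituting the operation $\Lambda := \Lambda - 1$ (resp. $\Lambda := 0$) for
$\Lambda := \Lambda + 1$. The abstract semantics of a program is then given by
the least solution of a recursive system of semantic equations

$$\langle\nu^\sharp_\ell, \pi^\sharp_\ell\rangle = F_\ell\left(\langle\nu^\sharp_{\ell'}, \pi^\sharp_{\ell'}\rangle_{\ell' \in \mathrm{Loc}(P)}\right)$$

where $F_\ell$ is defined as follows:

- If $\ell = \mathrm{next}(\ell')$ and $\ell'$ is the location of an atomic
  statement $s$, then

$$F_\ell\left(\langle\nu^\sharp_{\ell'}, \pi^\sharp_{\ell'}\rangle_{\ell' \in \mathrm{Loc}(P)}\right) = [\![s]\!]^\sharp\langle\nu^\sharp_{\ell'}, \pi^\sharp_{\ell'}\rangle$$

- If $\ell'' : \mathtt{while}\ (m < n)\ \mathtt{do}\ \ell : s; \ldots; \ell' : \mathtt{end}$, then

$$F_\ell\left(\langle\nu^\sharp_{\ell'}, \pi^\sharp_{\ell'}\rangle_{\ell' \in \mathrm{Loc}(P)}\right) = \langle\nu^\sharp_{\ell''} \oplus \{m < n\}, \pi^\sharp_{\ell''}\rangle \sqcup \mathrm{inc}_\Lambda\langle\nu^\sharp_{\ell'}, \pi^\sharp_{\ell'}\rangle$$

- If $\ell = \mathrm{next}(\ell')$ and $\ell' : \mathtt{while}\ (m < n)\ \mathtt{do} \ldots \mathtt{end}$, then

$$
F_\ell\left(\langle\nu^\sharp_{\ell'}, \pi^\sharp_{\ell'}\rangle_{\ell' \in \mathrm{Loc}(P)}\right) =
\left\{\begin{array}{ll}
\mathrm{reset}_\Lambda\langle\nu^\sharp_{\ell'} \oplus \{m \ge n\}, \pi^\sharp_{\ell'}\rangle & \text{if } \mathrm{top}(\ell') \\
\mathrm{dec}_\Lambda\langle\nu^\sharp_{\ell'} \oplus \{m \ge n\}, \pi^\sharp_{\ell'}\rangle & \text{otherwise}
\end{array}\right.
$$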


We apply classical fixpoint algorithms based upon iteration sequences with
widening and narrowing [Cou81,CC92] in order to obtain an upper approximation
$S^\sharp$ of the least fixpoint of the system.

Theorem 1. $S^\sharp$ is a sound approximation of the surface semantics, i.e.
$S \subseteq \gamma\left(\langle\nu^\sharp_\ell, \pi^\sharp_\ell\rangle_{\ell \in \mathrm{Loc}(P)}\right)$.

For example, consider the following program in our core pointer language that 
fills in an array a of pointers with newly allocated blocks: 



Example 3. 

1: n = 0;
2: while (n < 10) {
3:   q = &a;
4:   p = q + n;
5:   r = malloc();
6:   *p = r;
7:   n = n + 1;
8: }

If we use the lattice of convex polyhedra [CH78] as the numerical lattice
$\mathcal{V}^\sharp$, then the abstract environment obtained after analysis of
the surface structure at program point 6 is:

$$
\nu^\sharp = \left\{\begin{array}{l}
0 \le n < 10 \\
\Lambda = n \\
q_o = r_o = 0 \\
p_o = 4 \times n
\end{array}\right.
\qquad
\pi^\sharp = \left\{\begin{array}{l}
p_a \mapsto \{\&a\} \\
q_a \mapsto \{\&a\} \\
r_a \mapsto \{\mathrm{blk}_5(\tau = \Lambda, 0 \le \Lambda < 10)\}
\end{array}\right.
$$

assuming that pointers occupy four bytes in memory.

4 Nonuniform Inclusion Constraints

We now use the analysis of the surface structure to build a global
approximation of the memory graph. For this purpose we use an extension of
Andersen's inclusion constraints [And94] enriched with numerical indices that
allow us to describe nonuniform points-to relations. The syntax of a nonuniform
inclusion constraint is the following:

$$
\begin{array}{rcl}
C & ::= & (X(t) \supseteq \&x + o,\ \nu^\sharp(t, o)) \\
  & \mid & (X(t) \supseteq \mathrm{blk}_\ell(t') + o,\ \nu^\sharp(t, t', o)) \\
  & \mid & (X(t) \supseteq Y(t') + o,\ \nu^\sharp(t, t', o)) \\
  & \mid & ({*X(t)} \supseteq Y(t'),\ \nu^\sharp(t, t')) \\
  & \mid & (X(t) \supseteq {*Y(t')},\ \nu^\sharp(t, t'))
\end{array}
$$

where $t, t', o$ are special index variables denoting timestamp and offset
values and $X, Y$ are set variables. We assume that we are provided with a
countable collection of set variables. The second component $\nu^\sharp$ of a
nonuniform constraint is a system of numerical relationships between the index
variables appearing in the constraint.

The semantics of a system of nonuniform constraints is based upon an abstract
memory graph. An abstract memory graph $M^\sharp$ is a set of abstract points-to
relations

$$(a(t, o) \triangleright a'(t', o'),\ \nu^\sharp(t, t', o, o'))$$

where $a, a'$ are addresses and $t, t', o, o'$ are special index variables
representing the timestamps and offsets associated to each address. The abstract
numerical relation $\nu^\sharp$ expresses numerical constraints between these
index variables. The set $\mathcal{M}^\sharp$ of abstract memory graphs can be
provided with the structure of a lattice by pointwise extension of the
corresponding lattice operations over $\mathcal{V}^\sharp$.
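The denotation $\gamma_{\mathcal{M}^\sharp}(M^\sharp)$ of an abstract memory
graph is the set of memory graphs such that the offsets on the points-to edges
satisfy the constraints of the corresponding abstract edges. A valuation
$V^\sharp$ of set variables is a set of mappings

$$(X(t) \mapsto a(t') + o,\ \nu^\sharp(t, t', o))$$

where $a$ is an address and $t, t', o$ are numerical index variables. The set
$\mathrm{Val}^\sharp$ of all valuations can similarly be provided with the
structure of a lattice. Note that in the case of the address $\&x$ of a global,
the associated timestamp variable does not have any meaning and is not related
by any numerical constraint. We use a uniform notation in order to keep the
semantic definitions simple. A valuation can be seen as an abstraction of the
anchorage structure defined in Sect. 2. The semantics
$[\![C]\!]^\sharp : \mathcal{M}^\sharp \times \mathrm{Val}^\sharp \rightarrow \mathcal{M}^\sharp \times \mathrm{Val}^\sharp$
of a nonuniform inclusion constraint $C$ is defined as follows:

- $[\![(X(t) \supseteq \&x + o, \nu^\sharp)]\!]^\sharp(M^\sharp, V^\sharp) = (M^\sharp, V^\sharp \cup \{(X(t) \mapsto \&x + o, \nu^\sharp)\})$

- $[\![(X(t) \supseteq \mathrm{blk}_\ell(t') + o, \nu^\sharp)]\!]^\sharp(M^\sharp, V^\sharp) = (M^\sharp, V^\sharp \cup \{(X(t) \mapsto \mathrm{blk}_\ell(t') + o, \nu^\sharp)\})$

- $[\![(X(t) \supseteq Y(t') + o, \nu^\sharp)]\!]^\sharp(M^\sharp, V^\sharp) = (M^\sharp, V^\sharp \cup \{(X(t) \mapsto a(t'') + o'', [\nu^\sharp \sqcap \nu^\sharp_1 \oplus \{o'' = o + o'\}]_{t,t'',o''}) \mid (Y(t') \mapsto a(t'') + o', \nu^\sharp_1) \in V^\sharp\})$

- $[\![({*X(t)} \supseteq Y(t'), \nu^\sharp)]\!]^\sharp(M^\sharp, V^\sharp) = (M^\sharp \cup \{(a(t, o) \triangleright a'(t', o'), \nu^\sharp \sqcap \nu^\sharp_1 \sqcap \nu^\sharp_2) \mid (X(t) \mapsto a(t) + o, \nu^\sharp_1) \in V^\sharp \wedge (Y(t') \mapsto a'(t') + o', \nu^\sharp_2) \in V^\sharp\}, V^\sharp)$

- $[\![(X(t) \supseteq {*Y(t')}, \nu^\sharp)]\!]^\sharp(M^\sharp, V^\sharp) = (M^\sharp, V^\sharp \cup \{(X(t) \mapsto a'(t''') + o', \mu^\sharp) \mid (Y(t') \mapsto a(t'') + o, \nu^\sharp_1) \in V^\sharp \wedge (a(t'', o) \triangleright a'(t''', o'), \nu^\sharp_2) \in M^\sharp \wedge \mu^\sharp = [\nu^\sharp \sqcap \nu^\sharp_1 \sqcap \nu^\sharp_2]_{t,t''',o'}\})$

where we have freely renamed the index variables whenever it was necessary to
avoid name clashes. A solution of a system $S$ of nonuniform set constraints is
a couple $(M^\sharp, V^\sharp)$ which is invariant under the application of
$[\![C]\!]^\sharp$ for any $C \in S$.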

We are interested in the least solution of a system $S$ of nonuniform set
constraints. We can obtain an approximation of the least solution of $S$ by
computing the limit of the abstract iteration sequence with widening
$(M^\sharp_n, V^\sharp_n)_{n \ge 0}$ defined as follows:

$$
\left\{\begin{array}{rcl}
(M^\sharp_0, V^\sharp_0) & = & (\bot_{\mathcal{M}^\sharp}, \bot_{\mathrm{Val}^\sharp}) \\
(M^\sharp_{n+1}, V^\sharp_{n+1}) & = & (M^\sharp_n, V^\sharp_n)\ \nabla\ ([\![C]\!]^\sharp)_{C \in S}(M^\sharp_n, V^\sharp_n)
\end{array}\right.
$$

where $([\![C]\!]^\sharp)_{C \in S}$ denotes the application of all constraints
of $S$ in an arbitrary order, and $\nabla$ is the product of the widening
operators on $\mathcal{M}^\sharp$ and $\mathrm{Val}^\sharp$. This provides us
with an effective algorithm for computing an approximate solution of the system,
which is similar to that defined by Andersen [And94]. The main difference is the
use of a widening operator to enforce convergence because some abstract
numerical lattices have infinitely increasing chains of elements
[CC76,CH78,Min01]. Once a post-fixpoint has been reached using this algorithm,
we can further refine the result by using a decreasing iteration sequence with
narrowing defined in the same way. We observed from our experiments that an
iteration sequence with narrowing is always required in order to obtain precise
ranges for the timestamp and offset variables.
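The role of narrowing can be seen on a much smaller problem than the constraint systems above. The sketch below applies the standard interval widening and narrowing operators of [CC76] to the loop-head invariant of `n = 0; while (n < 10) n = n + 1;`. It is an illustration under simplifying assumptions (a single integer variable, intervals only, hypothetical type and function names), not the solver used by the analyzer.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define POS_INF LONG_MAX
#define NEG_INF LONG_MIN

/* Interval [lo, hi]; POS_INF and NEG_INF stand for unbounded ends. */
struct itv { long lo, hi; };

static long min_l(long a, long b) { return a < b ? a : b; }
static long max_l(long a, long b) { return a > b ? a : b; }

struct itv itv_join(struct itv a, struct itv b)
{
    return (struct itv){ min_l(a.lo, b.lo), max_l(a.hi, b.hi) };
}

/* Standard interval widening: an unstable bound jumps to infinity. */
struct itv itv_widen(struct itv old, struct itv next)
{
    return (struct itv){ next.lo < old.lo ? NEG_INF : old.lo,
                         next.hi > old.hi ? POS_INF : old.hi };
}

/* Standard interval narrowing: only infinite bounds are refined. */
struct itv itv_narrow(struct itv old, struct itv next)
{
    return (struct itv){ old.lo == NEG_INF ? next.lo : old.lo,
                         old.hi == POS_INF ? next.hi : old.hi };
}

/* Loop-head equation X = [0,0] join ((X meet (-inf, 9]) + 1). */
struct itv loop_step(struct itv x)
{
    struct itv init = { 0, 0 };
    struct itv body = { x.lo, min_l(x.hi, 9) };  /* guard n < 10 */
    if (body.lo > body.hi)
        return init;                             /* guard unsatisfiable */
    body.lo += 1;                                /* n = n + 1 */
    body.hi += 1;
    return itv_join(init, body);
}

/* Increasing iteration with widening, then one optional narrowing step. */
struct itv analyze_loop_head(bool use_narrowing)
{
    struct itv x = { 0, 0 };
    for (;;) {
        struct itv next = itv_widen(x, itv_join(x, loop_step(x)));
        if (next.lo == x.lo && next.hi == x.hi)
            break;                               /* post-fixpoint reached */
        x = next;
    }
    if (use_narrowing)
        x = itv_narrow(x, loop_step(x));
    return x;
}
```

Widening alone stabilizes on an interval that is unbounded above; a single narrowing step recovers the finite upper bound from the loop guard, which is the kind of refinement needed to obtain finite ranges for timestamps and offsets.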

We now have to show how to extract nonuniform inclusion constraints from the
abstract interpretation of the surface semantics. Let $S^\sharp$ be the abstract
surface semantics of a program $P$ obtained from the analysis described in the
previous section. We assign a unique pair of set variables
$(\mathcal{L}_\ell, \mathcal{R}_\ell)$ to each statement $\ell : *q = r$ or
$\ell : q = *r$ of $P$, denoting respectively the points-to sets of the
left-hand and right-hand sides of the assignment. Let
$\rho^\sharp = \langle\nu^\sharp, \pi^\sharp\rangle$ be an abstract environment,
$p$ a pointer variable of $P$ and $X$ a set variable. We denote by
$\mathcal{C}_{X,p}(\rho^\sharp)$ the collection of nonuniform constraints
defined as follows:

- If $\&x \in \pi^\sharp(p_a)$, then

$$(X(t) \supseteq \&x + o,\ [\nu^\sharp \oplus \{t = \Lambda, o = p_o\}]_{t,o}) \in \mathcal{C}_{X,p}(\rho^\sharp)$$

- If $\mathrm{blk}_\ell(\mu^\sharp) \in \pi^\sharp(p_a)$, then

$$(X(t) \supseteq \mathrm{blk}_\ell(t') + o,\ [\nu^\sharp \sqcap \mu^\sharp \oplus \{\tau = t', t = \Lambda, o = p_o\}]_{t,t',o}) \in \mathcal{C}_{X,p}(\rho^\sharp)$$

- If $\mathrm{ref}_\ell(\mu^\sharp) \in \pi^\sharp(p_a)$, then

$$(X(t) \supseteq \mathcal{L}_\ell(t') + o,\ [\nu^\sharp \sqcap \mu^\sharp \oplus \{\tau = t', t = \Lambda, o = p_o\}]_{t,t',o}) \in \mathcal{C}_{X,p}(\rho^\sharp)$$

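Now, if $\ell : *p = q$ is a memory write statement of $P$ and $\rho^\sharp$ is
the abstract environment of $S^\sharp$ at $\ell$, we generate the constraints:

$$\mathcal{C}_{\mathcal{L}_\ell,p}(\rho^\sharp) \cup \mathcal{C}_{\mathcal{R}_\ell,q}(\rho^\sharp) \cup \{({*\mathcal{L}_\ell(t)} \supseteq \mathcal{R}_\ell(t'),\ \top_{\mathcal{V}^\sharp} \oplus \{t = t'\})\}$$

Similarly, for a memory read statement $\ell : p = *q$ we generate the
constraints:

$$\mathcal{C}_{\mathcal{L}_\ell,p}(\rho^\sharp) \cup \mathcal{C}_{\mathcal{R}_\ell,q}(\rho^\sharp) \cup \{(\mathcal{L}_\ell(t) \supseteq {*\mathcal{R}_\ell(t')},\ \top_{\mathcal{V}^\sharp} \oplus \{t = t'\})\}$$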

We denote by $S_P$ the system of all constraints generated in this way for the
program $P$. Let $(M^\sharp_P, V^\sharp_P)$ be an approximation of the least
solution of $S_P$ obtained by an abstract iteration sequence as described
previously. The abstract memory graph $M^\sharp_P$ is a sound global
approximation of the memory graph at every point of the program:

Theorem 2. For every state $\langle\lambda, \mathcal{A}, M, \rho, \ell\rangle$
of the collecting semantics $C$ of $P$, we have
$M \in \gamma_{\mathcal{M}^\sharp}(M^\sharp_P)$.

The pointer analysis problem of [Ven02] has thus been reduced to the simpler and
more tractable problem of solving a system of nonuniform inclusion constraints.

We finish this formal description with a brief description of the constraint
generation for function calls. We associate a special set variable
$\mathcal{F}_i(f)$ to the $i$-th formal parameter of each function f of a C
program. We denote by $\mathcal{F}_0(f)$ the variable corresponding to the
return value of f. Now consider a function call
$\ell : p = f(p_1, \ldots, p_n)$. Assuming that we are provided with a
collection $X, X_1, \ldots, X_n$ of set variables describing the sets of
addresses that may flow through the return value and the parameters
$p, p_1, \ldots, p_n$ of the function call, we generate the following points-to
equations:

$$
\left\{\begin{array}{l}
(\mathcal{F}_1(f) \supseteq X_1,\ \top_{\mathcal{V}^\sharp}) \\
\quad\vdots \\
(\mathcal{F}_n(f) \supseteq X_n,\ \top_{\mathcal{V}^\sharp}) \\
(X \supseteq \mathcal{F}_0(f),\ \top_{\mathcal{V}^\sharp})
\end{array}\right.
$$

In other words, function calls are treated uniformly: there are no numerical
constraints on the index variables. This is not a problem in practice, since
nonuniform behaviours usually take place at the function level in embedded
applications. We do not detail the analysis of computed calls, which can be
easily derived from the semantics of the memory read operation p = *q.

We now illustrate the generation of equations. Consider the small program 
of Example 3 that fills in an array of pointers. The equations generated after the 
surface analysis are the following: 

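$$
\begin{cases}
(*\mathcal{L}_\ell(t) \supseteq \mathcal{R}_\ell(t'),\ \{t = t',\ 0 < t < 10\}) \\
(\mathcal{L}_\ell(t) \supseteq \&a + o,\ \{0 < o < 4 \times t\}) \\
(\mathcal{R}_\ell(t) \supseteq \mathrm{blk}_5\langle t' \rangle + o,\ \{t = t',\ o = 0,\ 0 < t < 10\})
\end{cases}
$$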

After solving these constraints by using an abstract iteration sequence with 
widening, we obtain the following abstract memory graph: 

$$
\{((\&a, o) \to (\mathrm{blk}_5\langle t \rangle, o'),\ \{o = 4 \times t,\ o' = 0,\ 0 < t < 10\})\}
$$

which describes the exact shape of the memory throughout the execution of the
program.
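
To make the meaning of this relation concrete, the following sketch checks its
run-time counterpart on a program of the same kind. It is an assumption-laden
illustration: the array length of 10 is taken from the bound on t, the block
size is arbitrary, and the pointer size is `sizeof(void *)` rather than the 4
bytes used above. The cell at byte offset o = sizeof(void *) × t must hold the
address of the block allocated at iteration t, at offset 0 inside that block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { N = 10 };

static void *a[N];        /* the array of pointers, "&a" in the text     */
static void *block_of[N]; /* block_of[t]: block allocated at iteration t */

/* Fill the array: iteration t allocates one block and stores its address
   in the t-th cell. */
void fill(void)
{
    for (int t = 0; t < N; t++) {
        block_of[t] = malloc(16);
        assert(block_of[t] != NULL);
        a[t] = block_of[t];
    }
}

/* Concrete counterpart of the inferred relation: the cell at byte offset
   o = sizeof(void *) * t points to the start (offset 0) of the block
   created at iteration t. Returns 1 if this holds for every cell. */
int relation_holds(void)
{
    for (int t = 0; t < N; t++) {
        size_t o = sizeof(void *) * (size_t)t;
        void *cell_value = *(void **)((char *)a + o);
        if (cell_value != block_of[t])
            return 0;
    }
    return 1;
}

/* Free every block allocated by fill(). */
void release(void)
{
    for (int t = 0; t < N; t++) {
        free(block_of[t]);
        block_of[t] = NULL;
        a[t] = NULL;
    }
}
```

A uniform analysis would merge all cells of the array and all blocks allocated
at the same site, so it could not state which cell points to which block.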

5 Experimental Evaluation 

We have implemented the static analysis described in this paper for the full
C language. The analyzer itself consists of 9,000 lines of SML/NJ excluding
the front-end. We have interfaced the analyzer with the ckit [HOM] C front-end,
which is also written in SML. We currently use the reduced product of
the lattice of linear equalities [Kar76] and the lattice of intervals [CC76] for
expressing numerical constraints. The analyzer first translates the C program
into an intermediate language in which all expressions and statements have been
broken down using a 3-address format. We then perform a dependency analysis
which is used to eliminate all arithmetic operations that are not involved in
pointer manipulations. This substantially shrinks the size of the code to
analyze. Whole structure assignment has not been described in this paper and
deserves some attention. There are two possible ways of handling this construct:
either by expanding the assignment into a collection of individual assignments
to the fields of the structure, or by analyzing the assignment as an atomic
operation. The former is made difficult by union types and structure-breaking
type casts. We chose the latter approach, which requires a straightforward
extension of nonuniform constraints in order to copy a packet of pointers at
once.
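
The difficulty with field-by-field expansion can be seen on a small C sketch.
The `packet` type and the helper functions below are hypothetical and serve only
as an illustration: because of the union, the same bytes can be read either as
two pointers or as raw data, so there is no single list of fields to expand,
whereas the whole structure assignment copies all pointers at once.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical message descriptor. The union makes it unclear which fields
   hold pointers at a given time. */
struct packet {
    int kind;
    union {
        struct { int *data; int *next; } ptrs; /* two pointers ...           */
        unsigned char raw[2 * sizeof(int *)];  /* ... or the same bytes, raw */
    } body;
};

/* Whole structure assignment viewed as one atomic operation: every pointer
   stored in the structure is copied at once. */
void copy_packet(struct packet *dst, const struct packet *src)
{
    assert(dst != NULL && src != NULL);
    *dst = *src;
}

/* Returns 1 if the pointers read through `ptrs` and the bytes read through
   `raw` agree between the two packets, 0 otherwise. */
int same_pointers(const struct packet *x, const struct packet *y)
{
    return x->body.ptrs.data == y->body.ptrs.data
        && x->body.ptrs.next == y->body.ptrs.next
        && memcmp(x->body.raw, y->body.raw, sizeof x->body.raw) == 0;
}
```

After `copy_packet(&d, &s)`, both views of `d` agree with those of `s`, whichever
member of the union the program uses later.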

We have applied the analyzer to a real piece of software: an on-board link
controller. The application contains about 25,000 lines of unprocessed C code.
It is a pointer-intensive program with plenty of loop constructs operating on
multidimensional arrays of structures. It is quite representative of an
average-size embedded program, which is the main target of our analysis. Very
large programs like those described in [VB04] are quite unusual. Our analysis is
quite efficient. It takes 210 seconds to parse the files, construct the abstract
surface semantics and generate the nonuniform inclusion constraints on a laptop
with a 900 MHz Intel Pentium and 1 GB of RAM running Linux under VMware. The
resolution of these constraints only takes 21 seconds.

The results show that the analysis does discover nonuniform points-to
relations. In particular, bidimensional arrays of distinct semaphores, arrays of
functions and tables of preallocated memory blocks for dedicated memory
management are exactly described. Surprisingly enough, the analysis uncovered a
real bug in this application. While we were reviewing the results of the
analysis, we noticed that for some array array2 of dynamically allocated
semaphores, there was no linear relationship between the offset and the
timestamps in the points-to relations. The nonuniform points-to equations
instantly gave us the location in the program where the array was initialized.
The initialization code looks like:

    for (i = 0; i < 20; i++)
        for (j = 0; j < 8; j++) {
            array1[i][j] = semCreate();
            array2[j] = semCreate();
        }

The first array is properly initialized whereas the second one is reinitialized
multiple times, causing a memory leak. It should be noted that the analysis
successfully inferred a nonuniform points-to relation for the bidimensional
array of semaphores. This bug was present from the very first version of the
program and has never been detected during the 18 months the software has been
undergoing testing so far. This is an interesting application of this static
analysis as a sophisticated typechecker for collections of pointers.
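
The size of the leak can be checked with a self-contained sketch. The
`semCreate` below is a hypothetical stand-in that merely counts allocations, the
array bounds are those of the loops above, and the repaired loop is one possible
fix rather than the one applied to the application.

```c
#include <assert.h>
#include <stdlib.h>

enum { ROWS = 20, COLS = 8 };

static int created = 0;          /* number of semaphores created so far */
static void *array1[ROWS][COLS];
static void *array2[COLS];

/* Hypothetical stand-in for the application's semCreate(): each call
   allocates a fresh object and bumps the counter. */
void *semCreate(void)
{
    void *s = malloc(1);
    assert(s != NULL);
    created++;
    return s;
}

/* Initialization as reported in the text: array2[j] is overwritten on every
   iteration of the outer loop. Returns the number of semaphores that are no
   longer reachable from either array (they are deliberately leaked here). */
int init_buggy(void)
{
    created = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++) {
            array1[i][j] = semCreate();
            array2[j] = semCreate();   /* previous value is lost */
        }
    return created - (ROWS * COLS + COLS);
}

/* One possible repair (an assumption): initialize array2 in its own loop so
   that each cell is written exactly once. */
int init_fixed(void)
{
    created = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            array1[i][j] = semCreate();
    for (int j = 0; j < COLS; j++)
        array2[j] = semCreate();
    return created - (ROWS * COLS + COLS);
}
```

With these bounds the original loop creates 320 semaphores of which only 168
remain reachable, so `init_buggy()` reports 152 leaked objects while
`init_fixed()` reports none.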

6 Conclusion 

We have presented a pointer analysis that is able to infer nonuniform points-to
relationships without the cost of existing flow-sensitive analyses [Deu94,Ven02].
The originality of our work is that it reconciles two approaches to pointer
analysis, abstract interpretation and constraint-based analysis, which are often
opposed to each other. Although we could have expressed the whole analysis
within the framework of Abstract Interpretation [CC95], we think that a
constraint-based presentation is more compact and more intuitive for both
understanding and implementing the analysis. We have shown on a representative
case study that our approach is tractable and achieves the expected level of
precision. Unexpectedly, this analysis was able to detect a subtle
initialization bug in a real application. It now remains to perform more
extensive empirical studies and investigate the use of the analysis in a real
verification tool.
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